Efficient Learning of Multi-step Best Response
Abstract
We provide a uniform framework for learning against a recent-history adversary in arbitrary repeated bimatrix games by modeling such an agent as a Markov Decision Process (MDP). We focus on learning an optimal non-stationary policy in such an MDP over a finite horizon, and we adapt an existing efficient Monte Carlo-based algorithm for learning optimal policies in such MDPs. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, a simple experiment in the Prisoner’s Dilemma game shows that even when no extra domain knowledge is assumed (beyond the opponent’s memory size), the error can still be small.
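As a rough illustration of the modeling idea in the abstract, the sketch below treats a memory-1 opponent in the Prisoner's Dilemma as an MDP whose states are the most recent joint actions, and computes a finite-horizon non-stationary policy. It is not the paper's algorithm: the paper adapts a Monte Carlo method for learning, whereas this sketch uses plain backward induction and assumes the opponent's response rule is already known. The payoff values, the `tit_for_tat` opponent, and the function `best_response` are all hypothetical choices made for the example.

```python
from itertools import product

# Row player's Prisoner's Dilemma payoffs (illustrative values, an assumption).
# Actions: 0 = cooperate, 1 = defect. Keys are (my_action, opponent_action).
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}
ACTIONS = (0, 1)


def tit_for_tat(history):
    """Hypothetical memory-1 opponent: copies our previous action, cooperates first."""
    return history[-1][0] if history else 0


def best_response(opponent, memory, horizon):
    """Backward induction on an MDP whose states are the last `memory` joint actions.

    Assumes the opponent is deterministic and its rule is known; this replaces
    the learning step described in the abstract with exact planning.
    """
    # States are tuples of up to `memory` joint actions (shorter at the start).
    states = [tuple(h) for k in range(memory + 1)
              for h in product(product(ACTIONS, ACTIONS), repeat=k)]
    value = {s: 0.0 for s in states}  # value with zero steps remaining
    policy = []                       # filled from the last step backwards
    for _ in range(horizon):
        new_value, step_policy = {}, {}
        for s in states:
            b = opponent(s)           # opponent's move depends only on the state
            best = None
            for a in ACTIONS:
                nxt = (s + ((a, b),))[-memory:] if memory else ()
                q = PAYOFF[(a, b)] + value[nxt]
                if best is None or q > best[0]:
                    best = (q, a)
            new_value[s], step_policy[s] = best
        value = new_value
        policy.append(step_policy)
    policy.reverse()                  # policy[t] is the decision rule at step t
    return value[()], policy


value, policy = best_response(tit_for_tat, memory=1, horizon=3)
print(value)            # total reward over 3 steps
print(policy[0][()])    # action at the first step
```

Under these assumed payoffs, the resulting policy cooperates early and defects on the final step, which is why the policy has to depend on the time step and not only on the state.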
Similar resources
Efficient Solution of Nonlinear Duffing Oscillator
In this paper, the efficient multi-step differential transform method (EMsDTM) is applied to obtain accurate approximate solutions for the strongly nonlinear Duffing oscillator. The main improvement of EMsDTM, a reduction in the number of arithmetic operations, is thoroughly investigated and compared with the classic multi-step differential transform method (MsDTM). To illustrate the applicabil...
Crashworthiness design of multi-cell tapered tubes using response surface methodology
In this article, the crashworthiness performance and crushing behavior of tapered structures with four internal reinforcing plates under axial and oblique dynamic loadings have been investigated. These structures have a tapered form with five cross-sections: square, hexagonal, octagonal, decagonal, and circular. In the first step, finite element simulations performed in LS-DYNA were validated ...
A Chance Constraint Approach to Multi Response Optimization Based on a Network Data Envelopment Analysis
In this paper, a novel approach for multi response optimization is presented. In the proposed approach, response variables in treatments combination occur with a certain probability. Moreover, we assume that each treatment has a network style. Because of the probabilistic nature of treatment combination, the proposed approach can compute the efficiency of each treatment under the desirable reli...
Multi-Objective Optimization of Demand Side Management and Multi DG in the Distribution System with Demand Response
The optimal management of distributed generation (DG) enhances the efficiency of the distribution system; on the other hand, customers' increasing interest in optimizing their consumption improves the performance of DG. This practice is called demand side management. In this study, a new method based on the intelligent algorithm is proposed to optimally operate the demand side management in the ...
A multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images
The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...